ABSTRACT


With the assignment of the last available blocks of public IPv4 addresses from the Internet
Assigned Numbers Authority, there is continued pressure for widespread IPv6 adoption.
Because the IPv6 address space is orders of magnitude larger than the IPv4 address space,
researchers need new methods and techniques to accurately measure and characterize growth
in IPv6. This thesis focuses on IPv6 router infrastructure and examines the possibility of us- 
ing heuristic methods in order to discover IPv6 router interfaces. We consider two heuristic 
techniques in an attempt to improve upon current state-of-the-art IPv6 router infrastruc- 
ture discovery methods. The first heuristic examines the ability to generate candidate IPv6 
addresses by finding the most common lower 64 bit patterns among IPv6 router interface 
addresses observed in historical probing data. The second heuristic generates candidate IPv6
addresses by assuming that an IPv6 address seen in historical probing data is one end of 
a point-to-point link, and uses the corresponding end’s IPv6 address. Using a distributed 
active topology measurement system, we test these heuristic methods on the IPv6 Internet. 
We find that our first heuristic is successful in discovering a non-trivial number of new
router interfaces, while the second heuristic is more efficient.
CHAPTER 1:


Introduction 





There are currently two types of Internet Protocol (IP) addresses assigned to network in- 
terfaces connected to the Internet. The first and most common type are Internet Protocol 
Version 4 (IPv4) addresses, which are 32-bit unsigned integers and are usually expressed in 
“dotted-decimal” notation (e.g., 74.125.196.147). The second type are Internet Protocol 
Version 6 (IPv6) addresses, which are 128-bit unsigned integers and are usually expressed 
in hexadecimal notation (e.g., 2607:f8b0:4002:c09::63). The address spaces
represented by both IPv4 and IPv6 are divided into IP allocation blocks by the Internet Assigned
Numbers Authority (IANA). IANA assigns an allocation block to one of the five
Regional Internet Registries (RIRs) based on geographical location. The RIRs then provide
sub-allocations out of their assigned IP allocation blocks to an entity. However, the last
unallocated IPv4 blocks were allocated by IANA in February 2011 [1], [2]. With the in- 
ability of at least two of the five RIRs [3]-[5] to provide sub-allocations from their allocated 
IPv4 address blocks, the availability of globally routable IPv4 addresses is quickly running 
out. This exhaustion of IPv4 address space has put increasing pressure on Internet Service 
Providers (ISPs), content providers, organizations, and individuals to adopt IPv6 technolo- 
gies due to the larger address space and greater availability of IPv6 addresses. Major ISPs, 
e.g., Comcast, recently have been experiencing shortages of IPv4 addresses to assign to 
their customers. These major ISPs are slowly adopting IPv6 as the larger address space 
of IPv6 enables them to better cope with the IPv4 address shortages. As ISPs and content 
providers continue to adopt IPv6, it becomes more advantageous for the end user to adopt 
IPv6 due to the possibility that, in the future, certain content will only be available to end 
users via IPv6. 


The specifications for IPv6 were adopted in December 1998 and defined in Request for 
Comment (RFC) 2460. While IPv6 has been standardized for the past 15 years, it did
not see appreciable deployment until the late 2000s. There are several reasons that have
caused IPv6 to not be widely adopted until recently. The primary reason for the lack of 
IPv6 adoption is that IPv6 is not backward compatible with IPv4. The lack of backward 
compatibility between IPv4 and IPv6 means that there is a need for increased complexity in
the network and additional resources needed for purchasing required equipment upgrades
to support IPv6. Additionally, the adoption of IPv6 has been slow due to security con- 
cerns. While IPv6 was engineered to address certain security issues found in IPv4 [6], it 
is believed that malicious actors will find ways to exploit IPv6 as an attack vector. For
example, researchers are already seeing malicious actors using IPv6 to bypass network security
devices due to the lack of IPv6 support or configuration in these devices (e.g., network
firewalls, network management, etc.). Finally, as adoption of IPv6 continues, malicious
actors will discover exploitation methods specific to IPv6 and create new attacks using
these newly discovered exploits [7], [8]. Thus, while the RFC specifies several motivations 
that lead to the creation of IPv6, the primary driver today is the exhaustion of usable IPv4 
address space [6]. 


Network Address Translation (NAT) technology provides an interim solution to the issue 
of IPv4 address exhaustion by extending the useful life of IPv4. As a result of NAT, wide- 
scale adoption of IPv6 has been slow [9]. Specifically, large-scale NAT technologies are 
being proposed by ISPs as an alternative to IPv6. These large-scale NAT technologies, 
often referred to as carrier-grade NAT, allow an ISP to use private IPv4 address space 
within that ISP’s internal network. This allows an ISP to share a single public IPv4 address 
among multiple subscribers [10]. However, NAT technologies do not solve the fundamental 
issue of IPv4 address space exhaustion. In fact, they introduce a new set of problems, 
including inhibiting end-to-end reachability, single points of failure in the network, and the
requirement to maintain a large amount of network state [1], [8].


While the larger address space in IPv6 can support continued growth in the Internet, it also 
presents challenges to the efforts of researchers who are attempting to map and understand 
the topology of the IPv6 Internet. For example, exhaustive active scanning techniques that 
were feasible for IPv4 are not feasible given the size of the IPv6 address space. 


The primary goal of this thesis is to develop alternative and efficient methods to discover 
IPv6 infrastructure, specifically router interfaces. By improving upon current IPv6
infrastructure discovery methods, we hope to enable better insight into the nature of the IPv6
transition and more fully understand the topology of IPv6.


1.1 Motivation 

With the continued exponential growth of IPv6 since 2008, approximately a thirty-fold 
increase as observed by Google, one can conclude that IPv6 is becoming more widely 
adopted. In addition, researchers are continuing to see growth in the core of the network 
with respect to support for IPv6. However, it is still unclear how widespread the adoption
is and where this growth is occurring [8], [11], [12].
Figure 1.1: Google's IPv6 Adoption Statistics as of February 2015, from [11]


To understand the need for alternative and more efficient ways to discover IPv6 infras- 
tructure, one must comprehend the size of the address space provided by IPv6 and the 
infeasibility of trying to probe all possible IPv6 addresses using current technologies. IPv6 
uses a 128-bit unsigned integer to indicate the address of an endpoint, providing IPv6 with 
$2^{128}$, or approximately $3.4 \times 10^{38}$, possible unique addresses. In contrast, IPv4 uses 32-bit
addresses, providing $2^{32}$, or approximately $4.3 \times 10^{9}$, possible unique IP addresses. Thus,
IPv6 provides an address space almost thirty orders of magnitude larger than IPv4. Given
that the IPv6 address space is orders of magnitude larger than IPv4, it is currently infeasible
to actively probe all possible addresses in the IPv6 address space.


For the purposes of discussion, assume that we have access to all the servers in a single data 
center (roughly 100,000 servers [13]) and that each server can probe IP addresses at a rate
of 20 addresses per minute. Using these assumptions, it would take approximately 2,148
minutes, or just under 36 hours, to probe the entire IPv4 address space. This is quite feasible
and there has been at least one instance of a botnet operating in a similar fashion to conduct
a complete scan of the IPv4 address space [14]. However, using the same assumptions to
probe the entire IPv6 address space, it would take $1.7 \times 10^{32}$ minutes, or approximately
$3.2 \times 10^{26}$ years, to complete, which is an unrealistic timeframe. Thus, a current challenge
faced by researchers and malicious actors alike is to find intelligent and efficient probing
methods in IPv6 for discovering hosts and infrastructure.
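
The estimate above can be reproduced with a few lines of arithmetic. The following is a minimal sketch that uses only the two assumptions stated in the text (100,000 servers, 20 probes per server per minute); the function and variable names are illustrative and a 365-day year is assumed.

```python
# Back-of-the-envelope scan times under the assumptions stated in the text.
SERVERS = 100_000             # servers in a single data center
PROBES_PER_SERVER_MIN = 20    # probes each server sends per minute
MINUTES_PER_YEAR = 60 * 24 * 365  # assumption: 365-day year


def scan_minutes(address_bits: int) -> float:
    """Minutes needed to probe every address in a 2**address_bits space."""
    return 2 ** address_bits / (SERVERS * PROBES_PER_SERVER_MIN)


ipv4_minutes = scan_minutes(32)
ipv6_minutes = scan_minutes(128)
ipv6_years = ipv6_minutes / MINUTES_PER_YEAR

print(f"IPv4: {ipv4_minutes:,.0f} minutes ({ipv4_minutes / 60:.1f} hours)")
print(f"IPv6: {ipv6_minutes:.1e} minutes ({ipv6_years:.1e} years)")
```

The IPv4 figure comes out just under 2,148 minutes, and the IPv6 figure is on the order of $10^{32}$ minutes, consistent with the values quoted above.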


1.2 Research Questions

The focus of this thesis is finding alternative and efficient methods to discover IPv6 in- 
frastructure. To narrow the scope of our research, we focus on the ability to use heuristic 
methods to discover IPv6 router interfaces. A heuristic is a form of problem solving that 
uses a practical methodology in order to find a sufficient solution to a problem in a reason- 
able amount of time when finding the optimal solution to the problem is either impossible 
or impractical. Examples of common heuristic methods include using a rule of thumb to 
solve a problem, making an educated guess, and using common sense. The optimal so- 
lution to the problem of discovering IPv6 infrastructure, in particular router interfaces, is 
to exhaustively probe the entire address space. However, this optimal solution has been
shown to be infeasible (Section 1.1). Therefore, we seek to show that by using heuristic
techniques, we can discover IPv6 router interfaces in a reasonable amount of time.


This thesis begins by using historical IPv6 probe data from the Center for Applied Internet 
Data Analysis (CAIDA) Archipelago Measurement Infrastructure (Ark) and a large-scale
Content Distribution Network (CDN) as inputs into our proposed heuristic techniques. The
outputs of our heuristic techniques are candidate IPv6 addresses. We then actively probe
the path to these candidate addresses, also using the Ark infrastructure, to determine the
ability of our heuristics to discover new IPv6 router interfaces.


In our research into the feasibility of using heuristic techniques to discover IPv6 router
interfaces, we seek to answer the following questions:


• Does historical data reveal patterns in IPv6 addressing via the host portion of an IPv6
address?

• If there are patterns in the historical data of IPv6 addressing, is it possible to leverage
these patterns in order to discover previously unknown IPv6 router interfaces?

• Do the discovered IPv6 router interfaces correspond to interfaces on previously
known or new routers?

• Assuming the existence of point-to-point links, is it possible to leverage this
assumption in order to discover previously unknown IPv6 router interfaces?


1.3 Contributions 


Our research efforts into the feasibility of using heuristic methods to discover IPv6
infrastructure yielded the following findings:


• Although the IPv6 address space is very large, there is low entropy in the host bits
of router IPv6 interface addresses. The host bit values of ::1 and ::2 account for
almost a third of all host addresses observed in historical data.

• A heuristic-based probing approach can be successful in discovering a non-trivial
number of new IPv6 router interfaces.

• Performing Internet-wide probing using a heuristic method based on the 10 most
common host bit patterns of an IPv6 address yielded the discovery of approximately 5,500
previously unseen router interfaces.

• Performing Internet-wide probing using a heuristic method based on generating
IPv6 addresses by inferring the existence of point-to-point links yielded the
discovery of approximately 10,150 previously unseen router interfaces. Additionally, this
heuristic was more efficient than our other heuristic method: it produced
the maximum number of new router interfaces with the fewest candidate
IPv6 addresses probed.


1.4 Thesis Structure 


The remainder of this thesis is organized as follows: 


• Chapter 2 discusses other IPv6 measurement and topology work, previous related
work on using heuristics to discover IPv6 hosts, and IPv6 alias resolution techniques.

• Chapter 3 outlines two heuristic methods that generate candidate IPv6 addresses and
describes our methodology for large-scale probing of these candidates.

• Chapter 4 provides results from our analysis of historical data on IPv6 addresses and
results from probing using our most common lower 64-bit host heuristic and our
point-to-point link heuristic.

• Chapter 5 details our research conclusions and provides recommendations for future
research areas related to this work.





CHAPTER 2:
Background and Related Work 





With the current ongoing transition from IPv4 to IPv6, researchers and content providers 
are interested in measuring the deployment of IPv6. Various content providers and orga- 
nizations, including Akamai [15], Google [11], [16], and the U.S. government [17], all 
have web pages dedicated to providing near real-time statistics on the deployment of IPv6 
from their respective vantage points. In addition, researchers are actively conducting ex- 
periments and measurements to characterize the adoption, use, and evolution of IPv6 using 
a variety of metrics and techniques. This chapter reviews features of IPv6 that are relevant
to this thesis and describes related research.


2.1 Overview of IPv6 


As discussed in Chapter 1, an IPv6 address is a 128-bit unsigned integer. Its string
presentation format is expressed in hexadecimal notation with the form x:x:x:x:x:x:x:x
where each x represents 16 bits of the address as four hexadecimal digits. In order to
shorten the length of an IPv6 address, two shorthand notations have been adopted for IPv6.
The first shorthand notation involves dropping all leading zeros in each sub-portion of
an IPv6 address (e.g., 2607:f8b0:4002:0c09:0000:0000:0000:0063 is equivalent to
2607:f8b0:4002:c09:0:0:0:63). The second shorthand notation uses “::” to represent
one variable-length run of zeros (e.g., 2607:f8b0:4002:0c09:0000:0000:0000:0063 is
equivalent to 2607:f8b0:4002:c09::63) [18]. For the purposes of our research, we
define the “host bits” as the 64 lower, or least significant, bits of the 128-bit address. We term
the upper, or most significant, 64 bits of the address as the “network bits.”
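
The notations and the host/network split described above can be illustrated with Python's standard `ipaddress` module. This is only a sketch for the example address used in the text; the variable names are ours.

```python
import ipaddress

addr = ipaddress.IPv6Address("2607:f8b0:4002:0c09:0000:0000:0000:0063")

full_form = addr.exploded      # all eight 16-bit groups, zero-padded
short_form = addr.compressed   # leading zeros dropped, one zero run written as "::"

# Split the 128-bit integer into the two 64-bit halves defined in the text.
value = int(addr)
network_bits = value >> 64              # upper (most significant) 64 bits
host_bits = value & ((1 << 64) - 1)     # lower (least significant) 64 bits

print(full_form)
print(short_form)
print(f"network bits: {network_bits:#018x}")
print(f"host bits:    {host_bits:#018x}")
```

Treating the address as an integer makes it straightforward to compare host bits across addresses that sit in different prefixes.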


In order for a client, router or server to be able to communicate on the network via IPv6 
it first needs to be assigned a globally unique IPv6 address. There are three primary ways 
to assign an IPv6 address to a device. The first method is via Stateless Address Autocon- 
figuration (SLAAC). SLAAC allows the host to generate a unique IPv6 address with the 
network prefix provided in router advertisement messages. The host creates a unique set of 
host bits by using the Media Access Control (MAC) address of its interface. To form the
full 64-bit host bits, SLAAC inserts the hexadecimal value 0xFFFE in between the upper
24 bits and the lower 24 bits of the MAC address [19]. The second method used for IPv6
address assignment is Dynamic Host Configuration Protocol (DHCP) in which the host re- 
quests an IP address from a DHCP server running on the network. The DHCP server then 
assigns the host an IPv6 address to use; often this assigned address is the next available 
IPv6 address in a block of values predefined by the network administrator [20]. In the third 
method, the host is manually configured by the network administrator with an unused IPv6 
address. IPv6 address assignment is important to this thesis because it has a significant
effect on our ability to develop heuristic techniques for intelligently probing IPv6.
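
As an illustration of the SLAAC case, the sketch below derives the 64 host bits from a MAC address. In addition to inserting 0xFFFE as described above, the modified EUI-64 format defined in the IPv6 addressing architecture also inverts the universal/local bit of the first octet, so the sketch does both. The function names, the documentation prefix 2001:db8::/64, and the sample MAC address are assumptions made for the example.

```python
import ipaddress


def eui64_interface_id(mac: str) -> int:
    """Build a 64-bit interface identifier from a 48-bit MAC address."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Insert 0xFFFE between the upper 24 bits and the lower 24 bits.
    eui = octets[:3] + b"\xff\xfe" + octets[3:]
    # Modified EUI-64 inverts the universal/local bit (0x02 in the first octet).
    eui = bytes([eui[0] ^ 0x02]) + eui[1:]
    return int.from_bytes(eui, "big")


def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine an advertised /64 prefix with the EUI-64 host bits."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch expects a /64 prefix")
    return network[eui64_interface_id(mac)]


example = slaac_address("2001:db8::/64", "00:11:22:33:44:55")
print(example)
```

Addresses formed this way have high-entropy host bits tied to hardware, in contrast to manually configured or sequentially assigned addresses, which tend to have small host values.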


2.2 IPv6 Deployment Measurement Studies 

Significant prior research has sought to measure the deployment and growth of IPv6. One 
of the major challenges faced by IPv6 researchers is that many of the techniques developed 
for measuring IPv4 do not translate well, or at all, to IPv6 due to protocol differences and 
the much larger address space. As a result, researchers have developed new techniques and 
methods to accurately measure and characterize the growth in IPv6. The research discussed 
in this section focuses on IPv6 infrastructure deployment measurements; other research not
discussed focuses instead on client adoption of IPv6.


Previous work by CAIDA researchers [8], [12] focused on using data from
publicly available Border Gateway Protocol (BGP) datasets in order to characterize trends in
the growth of IPv6 and compare the growth of IPv6 to the growth of IPv4. Their re- 
search showed that IPv6 is experiencing an exponential growth trend while IPv4 growth 
is currently increasing gradually and linearly. They believe that the gradual linear growth 
in IPv4 is associated with the exhaustion of the address space. At the time of their data 
collection, the majority of the growth observed in IPv6 was in the core of the network, 
driven primarily by transit and content providers. One specific hypothesis that CAIDA re- 
searchers wanted to address was whether the maturing IPv6 topology was becoming more 
or less congruent with the current IPv4 topology. They analyzed AS-level path data over an
eight-year period to test their hypothesis and determined that the similarity between IPv4
and IPv6 Autonomous System (AS) level paths increased from 10-20% to 40-50% during
that timeframe. Thus, they showed that as IPv6 matures, it is becoming more congruent
with the current IPv4 topology.


While the researchers from CAIDA focused on BGP to measure growth in IPv6, research 
by Czyz et al. [21] took a broader view and examined BGP data, CAIDA traceroutes, traf- 
fic data from an ISP, and several other datasets in order to draw conclusions regarding the 
growth of IPv6. In their study, they focused on sixteen different metrics to measure the 
growth in IPv6. Specific metrics examined included the number of IPv6 address block
allocations, the ability to resolve hostnames using the Domain Name System (DNS), the number
of queries being made for DNS quad-A (AAAA) resource records, and the current
usage and traffic of IPv6 as viewed from an ISP. The researchers noticed orders of
magnitude differences in the results from each metric, indicating that no one metric can at the
moment accurately measure and characterize the growth in IPv6. However, they were able
to conclude that IPv6 is experiencing a large amount of growth and that the performance of
IPv6 is now comparable to that of IPv4.


Older work from Xiao et al. examined the IPv6 AS-level topology [22]. They focused on 
studying IPv6 as a complex network and wanted to know if they could categorize IPv6 as 
a scale-free network. It should be noted that this research was done in 2009 before any 
major adoption of IPv6 had occurred. However, they were able to show IPv6 was indeed 
a scale-free network similar to IPv4 but that the topology of the network was less uniform
than the topology in IPv4.


The previously discussed IPv6 deployment measurement studies focused mainly on using 
historical BGP and Ark data to measure the growth in IPv6. Instead, our research fo- 
cuses on the ability to use historical active traceroute probing data and heuristic methods to 
conduct experimental probing of the IPv6 address space attempting to discover new IPv6 
router interfaces. If we are successful in determining the feasibility and effectiveness of 
using heuristic methods to discover new IPv6 router interfaces, then we believe that other 
researchers will be able to use our heuristic methods to improve their data collection
techniques in IPv6, thereby increasing their ability to accurately measure and characterize the
growth in IPv6.


2.2.1 IPv6 Measurement Infrastructure 
With the ongoing transition to IPv6, and interest by many in measuring the transition,
researchers require some form of dedicated measurement infrastructure. Ideally, this
measurement infrastructure would be distributed, thereby providing multiple vantage points
into the network and offering researchers autonomy and flexibility in their data
collection. An infrastructure able to collect data from multiple vantage points would also provide
researchers a more representative sampling of the IPv6 network. Two such major
infrastructures have been used to measure IPv6 deployment.


The first infrastructure currently being used to measure IPv6 deployment, and the
infrastructure we used in our research, is CAIDA’s Ark [23]. Ark evolved from
CAIDA’s previous skitter-based measurement infrastructure. Ark uses the scamper
program to perform topology probing in both IPv4 and IPv6. scamper provides researchers
the capability to perform ping and traceroute network measurements; additionally, 
scamper provides support for Paris traceroute, Multi-path Detection Algorithm 
traceroute, and various alias resolution techniques [24]. As of February 2015, Ark con- 
sisted of 106 monitors, or vantage points, with 39 IPv6 capable monitors [25]. Currently, 
CAIDA’s Ark performs topology measurement in IPv6 by probing a random IPv6 address
and the ::1 address in every advertised BGP prefix from each vantage point per cycle of
probing [25]. The topology measurement or probe data contains traceroute information from
a given vantage point to a destination address. This data contains the IPv6 addresses of 
router interfaces traversed during the traceroute, Round Trip Times (RTTs), and other 
data from the Internet Control Message Protocol Version 6 (ICMPv6) messages returned 
from the traceroute. In addition to Ark’s automatic collection of topology data, Ark 
allows researchers to use it in an on-demand mode. This on-demand mode allows a re- 
searcher to request Ark to perform either a ping or traceroute from a requested vantage 
point to a specified destination IP address. In our research, we test the ability of our
heuristic methods to discover router interfaces by utilizing the topology on-demand mode of Ark.


The second infrastructure that has been used to measure IPv6 deployment is BeiHang
University National Lab of Software Development Environment (NLSDE)’s Dolphin [26].
Dolphin was developed solely to collect topology information and performance information
in IPv6. Unlike CAIDA at the time, Dolphin could conduct near-real-time measurements
of IPv6. Dolphin used a modified version of traceroute to collect topology data for IPv6.
However, it appears that this project ceased in 2010 and that CAIDA’s Ark is the only IPv6
measurement infrastructure that is currently active and in use today.
2.3 Methods to Discover IPv6 Addresses


One of the major issues faced by researchers in measuring IPv6 deployment and determin- 
ing the network topology in IPv6 is developing methods to intelligently probe the IPv6 
address space. While some researchers primarily focus on measuring and characterizing 
IPv6 growth, other researchers are working to develop new techniques to probe active
portions of the IPv6 address space. This is a challenge given the size of the IPv6 address
space.


A study of insecurities in IPv6 by Heuse [27] proposed a method for probing in IPv6 
that we used as the basis of our heuristic method described in Section 3.2.1. Part of
Heuse’s research was on the feasibility of performing remote alive probing in IPv6. During 
the course of his research, he realized that by combining information found from search 
engines, IPv6 address databases, and DNS records, he could possibly determine commonly 
used addresses in IPv6. Using data from various IPv6 databases and DNS records, he 
was able to determine that, from his dataset of unique IPv6 addresses, the vast majority 
of addresses (approximately 60-70%) shared common host addresses. Analyzing the host 
addresses, Heuse determined that if a host’s IPv6 address was either manually configured 
or provided from a DHCP server, he could leverage this information to brute force discover 
additional IPv6 addresses. Using this theory, he was able to brute force candidate IPv6
addresses and successfully discovered new live hosts. Our research seeks to perform a
similar technique, but focuses instead on discovering router interface addresses.


In their research, Bellovin et al. postulated possible methods for worms to propagate in IPv6
and divided these methods into local versus wide area propagation methods [28]. The local 
area methods of propagation primarily rely upon the ability of the worm to perform network 
reconnaissance using an infected host machine. These local propagation methods are not 
relevant to our research because researchers often do not have access to the remote networks 
they are probing. However, several of the wide area methods of propagation could form 
the basis for possible heuristic methods to intelligently probe the IPv6 address space. One 
method discussed the fact that IPv6 servers often have low-numbered addresses to enable 
easy memorization by system administrators. This method supports the work performed by 
Heuse and again leads us to believe that we can leverage this information to find a heuristic
method to discover router interfaces. A second method suggested that an IPv6 worm could
perform a dictionary search of hostnames using DNS to collect candidate IPv6 addresses
from the returned AAAA records. A third method proposed was that the worm could use 
peer-to-peer networks to learn IPv6 addresses of the hosts within the peer-to-peer network. 
To accomplish this, the worm would have to participate in the topology maintenance of 
the peer-to-peer network, watching and listening for responses to queries, and occasionally
sending queries of its own in an attempt to learn host addresses.


We utilize the techniques described by Heuse and Bellovin et al. in our IPv6 router interface
discovery work.
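
To make the common-host-address idea concrete, the following sketch counts the most frequent lower 64-bit patterns in a list of observed addresses and combines them with target prefixes to produce candidates. It illustrates the general approach only and is not the implementation used in this thesis; the function names, the tiny sample list, and the choice of target prefixes are assumptions for the example.

```python
import ipaddress
from collections import Counter

HOST_MASK = (1 << 64) - 1  # selects the lower 64 "host bits"


def common_host_bits(observed, top_n=10):
    """Return the top_n most frequent lower 64-bit patterns among observed addresses."""
    counts = Counter(int(ipaddress.IPv6Address(a)) & HOST_MASK for a in observed)
    return [bits for bits, _ in counts.most_common(top_n)]


def candidates(prefixes, host_patterns, known):
    """Yield prefix + host-pattern combinations that have not been observed before."""
    known_ints = {int(ipaddress.IPv6Address(a)) for a in known}
    for prefix in prefixes:
        base = int(ipaddress.IPv6Network(prefix).network_address) & ~HOST_MASK
        for bits in host_patterns:
            value = base | bits
            if value not in known_ints:
                yield ipaddress.IPv6Address(value)


# Hypothetical observations drawn from the 2001:db8::/32 documentation range.
observed = [
    "2001:db8:0:1::1", "2001:db8:0:2::1", "2001:db8:0:3::2",
    "2001:db8:0:4::1", "2001:db8:0:5::2", "2001:db8:0:6::a",
]
patterns = common_host_bits(observed, top_n=2)
new_targets = list(candidates(["2001:db8:0:1::/64"], patterns, observed))
print(patterns, new_targets)
```

In this toy input the patterns ::1 and ::2 dominate, and the only new candidate for the chosen prefix is its ::2 address, since its ::1 address was already observed.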


2.4 IPv6 Alias Resolution 


Router alias resolution provides researchers another way to look at the topology of a net- 
work. While the focus of large scale active topology probing is to discover router interface 
addresses, alias resolution seeks to determine which interfaces belong to the same physical 
router. Thus, alias resolution permits researchers to infer the router-level topology of a 
network as opposed to the interface-level topology. 


Suppose one were to perform two traceroutes to the same destination from different vantage
points. During the first traceroute, at some point along the path interface A is seen followed
by interface C. On the second traceroute, interface B is seen followed by interface C. Alias
resolution seeks to determine whether interfaces A and B are actually different interfaces on the
same router (i.e., aliases) or interfaces on two different routers (see Figure 2.1).




Figure 2.1: Diagram of an Alias Resolution Instance [29] 


Keys [29] surveyed and discussed various methods for performing alias resolution in IPv4 
and the ability to use those methods to perform Internet-scale alias resolution. Keys
categorized these alias resolution techniques into two main categories: (a) fingerprinting
techniques and (b) analytical techniques. He defined fingerprinting techniques as those that
send probe packets to different IPv4 addresses and use identifying characteristics from 
the responses to infer if the responses came from the same router or not. In general, fin- 
gerprinting techniques are more accurate for alias resolution but are dependent upon the 
routers being configured to respond to probe packets. Analytical techniques instead at- 
tempt to draw inferences about the underlying topology of a network by analyzing the IP 
address graph. Analytical techniques rely upon many assumptions and, as a result, are often 
less accurate than fingerprinting. Some of the well-known and widely used IPv4 alias resolution
techniques discussed by Keys included Ally, RadarGun, Analytic and Probe-based
Alias Resolver (APAR), and kapar. However, none of these techniques can be used in
IPv6 because they either have not been, or cannot be, adapted to IPv6.


Currently, speedtrap is the only large-scale alias resolution technique for IPv6 [30]. The 
previously mentioned techniques for IPv4 alias resolution all rely on characteristics of IPv4, 
such as the identification (ID) field in the IPv4 header, that do not exist in IPv6. To develop 
IPv6 alias resolution techniques, researchers needed to find unique IPv6 protocol features
to exploit for the purposes of alias resolution, similar to IPv4 fingerprinting-based alias
resolution techniques. Researchers discovered that by forcing a router to perform packet
fragmentation, the IPv6 fragmentation extension header could be used for the purpose of
alias resolution. Normally, IPv6 does not perform in-network packet fragmentation, instead
placing the responsibility of fragmentation and reassembly on the end points. speedtrap 
performs alias resolution in IPv6 by inducing a router to send fragmented IPv6 packets from 
its control plane. speedtrap is then able to extract information from the fragmentation 
identification field in the fragmentation header to perform alias resolution using a
fingerprinting technique. The functionality of speedtrap has been implemented in scamper.
We use speedtrap to perform IPv6 alias resolution on our newly discovered IPv6 router
interfaces to provide insight into the router infrastructure we are finding via our heuristics.
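
The intuition behind fragment-identifier fingerprinting can be sketched as follows. If two interfaces belong to the same router and that router draws fragment identifiers from a single counter, then identifiers collected by alternating probes between the two interfaces should form one increasing sequence. The check below is a deliberately simplified illustration of that idea and is not the speedtrap algorithm itself; the function name, the minimum sample count, and the `max_gap` threshold are assumptions.

```python
ID_SPACE = 1 << 32  # the IPv6 fragment header carries a 32-bit identification field


def shares_counter(samples, max_gap=1000):
    """Check whether interleaved fragment IDs look like one shared counter.

    samples: fragment IDs in the order received, collected by alternating
    probes between two candidate interfaces (A, B, A, B, ...).
    max_gap: hypothetical bound on how far the counter may advance between
    consecutive samples before the sequence is rejected.
    """
    if len(samples) < 4:
        return False  # too few samples to say anything
    for prev, cur in zip(samples, samples[1:]):
        step = (cur - prev) % ID_SPACE  # modular difference tolerates wrap-around
        if not 0 < step <= max_gap:
            return False
    return True


likely_aliases = shares_counter([100, 101, 103, 104, 106, 107])
likely_distinct = shares_counter([100, 9000, 101, 9001, 102, 9002])
print(likely_aliases, likely_distinct)
```

A real measurement has to cope with probe loss, rate limiting, and routers that do not use a sequential counter, which this sketch ignores.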


2.5 Subnet Inference via Router Topology Studies 

A third concept that is useful in determining the underlying topology of a network is the 
ability to infer subnet information about the network. Gunes et al. [31] studied the
relationship between collected IPv4 addresses from path traces of a network and the ability to infer
subnet information from the collected data. The ability to infer subnet
information from collected path trace data provides researchers another means, like IP alias
resolution, to generate an accurate and complete topology map of a given network. The goal
of subnet inference is to determine whether seemingly separate links discovered via path
traces can be merged into their single hop representation (e.g., point-to-point, multi-access,
etc.). In order to infer subnet relations, Gunes et al. began by grouping IP addresses from
the collected path trace data into candidate subnets based on the IP addresses having the
same maximum x-bit prefix. From this maximum /x subnet their technique would then
recursively form increasingly smaller candidate subnets. These candidate subnet
relationships next needed to be pruned in an attempt to correlate the inferred subnet relationships to
the actual subnet relationships that exist in the Internet. Gunes et al. proposed a set of four
complementary conditions that assist in pruning down the candidate subnet relationships.
We rely on some of the high-level concepts for inferring subnet information in a network
discussed by Gunes et al. to guide the development of our inferred subnet based heuristic
technique for IPv6 router interface discovery (see Section 3.2.2).
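
As a rough illustration of how a subnet assumption can turn one observed address into a candidate, the sketch below computes the "other end" of a point-to-point link. The subnet sizes are assumptions made for the example (a /127, where the two ends differ only in the last bit, and a /126, where the two usable addresses end in binary 01 and 10); the heuristic actually used in this thesis is described in Section 3.2.2. The function name `p2p_peer` is hypothetical.

```python
import ipaddress


def p2p_peer(address: str, prefix_len: int = 127):
    """Return the presumed far-end address of a point-to-point link, or None."""
    value = int(ipaddress.IPv6Address(address))
    if prefix_len == 127:
        # Two addresses per subnet: flip the lowest bit.
        return ipaddress.IPv6Address(value ^ 0b1)
    if prefix_len == 126:
        # Four addresses per subnet; only the middle two (01 and 10) are treated as ends.
        if value & 0b11 in (0b01, 0b10):
            return ipaddress.IPv6Address(value ^ 0b11)
        return None
    raise ValueError("this sketch only handles /126 and /127 links")


print(p2p_peer("2001:db8::1", 126))   # assumed /126 link
print(p2p_peer("2001:db8::4", 127))   # assumed /127 link
```

Each observed address yields at most one candidate under either assumption, so the number of probes grows only linearly with the size of the historical dataset.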


2.6 Recursive Subnet Inference (RSI) Probing Algorithm 
While there is currently active research in discovering more intelligent probing primitives
for IPv6, there has been similar work in discovering intelligent IPv4 probing primitives.
An example of an intelligent IPv4 probing primitive is the RSI algorithm. RSI was rooted
in concepts from the Subnet Centric Probing (SCP) algorithm, but went in a new direction
to overcome some of the limitations of SCP. In general, RSI works by performing a binary
search over a given input prefix. RSI uses the input prefix to determine the probing
search space and divides the search space in half. The algorithm generates a candidate
address for probing at the midpoint address in each half of the search space. Based on the
results from the probe, RSI decides whether to continue recursively dividing the search
space in half and probing additional candidate IP addresses or to terminate the search on
that branch of the binary search tree [32].
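As a rough illustration of the halving step described above, the sketch below splits an
input prefix into its two halves and computes a midpoint candidate address in each. It is
not the RSI implementation from [32]: the function name halves_with_midpoints is
hypothetical, and the rule RSI uses to decide whether to keep recursing is not reproduced.

```python
import ipaddress


def halves_with_midpoints(prefix):
    """Split a prefix in half and return (half, midpoint address) pairs.

    Hypothetical helper: it shows a single halving step of a binary search
    over a prefix, not the RSI stopping rule.
    """
    network = ipaddress.ip_network(prefix)
    pairs = []
    for half in network.subnets(prefixlen_diff=1):
        # Midpoint = first address of the half plus half of its size.
        midpoint = half.network_address + half.num_addresses // 2
        pairs.append((half, midpoint))
    return pairs


# Example with a documentation prefix (an assumption, not from the study).
for half, midpoint in halves_with_midpoints("192.0.2.0/24"):
    print(half, midpoint)
```

A full search would call this step again on whichever halves the probe results mark as
worth exploring.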
CHAPTER 3:
Methodology 





As discussed previously in Section 1.1, it is feasible to probe all possible IPv4 addresses
to determine the network infrastructure in IPv4. However, in IPv6 such exhaustive probing
of the entire address space is unrealistic. Instead, we need more intelligent methods to
discover IPv6 infrastructure and understand the topology in IPv6. One intelligent method
of probing that is currently being researched is the RSI algorithm. Research into an IPv6
version of RSI has yet to be successful but has provided additional insight into subnetting
in IPv6 [33]. This study instead focuses on determining the feasibility of using heuristic
methods to discover IPv6 router interfaces. We used heuristic techniques to generate
candidate IPv6 addresses for probing instead of performing a binary search in a given prefix
to recursively generate candidate IPv6 addresses for probing.


3.1 Datasets 


Our research into heuristic techniques for discovering IPv6 router interfaces utilized two
unique datasets. The first set of data included all of CAIDA’s Ark IPv6 topology probing
from the month of July from the years 2009 to 2014 [34]. General information summarizing
this data is given in Table 3.1. Although this dataset was not used to generate candidate
IPv6 addresses for probing, it did provide insight into how the distribution of the host bits
has changed over a period of six years. Analysis of the historical distribution of the host
bits can be seen in Figures 4.1 and 4.2 and Table 4.1, with further discussion in
Section 4.1.1.


The second set of data included all Ark topology probing results from January to August
2014 and a set of IPv6 router interface addresses collected by a large CDN. Table 3.2
summarizes this data. The data provided by the large CDN was only a list of IPv6 addresses
and did not contain any information regarding the number of vantage points or number of
traces used to generate the list. This second set of data was used to generate our list of
candidate IPv6 addresses for experimental probing to test our hypotheses about our two
heuristic techniques.
Name of Dataset     | Number of      | Number of  | Number of Unique  | Number of Unique
                    | Vantage Points | Traces     | Router Interfaces | Network Masks
--------------------+----------------+------------+-------------------+-----------------
CAIDA Ark July 2009 | 8              | 195,678    | 6,372             | 3,008
CAIDA Ark July 2010 | 10             | 331,968    | 9,342             | 4,282
CAIDA Ark July 2011 | 27             | 2,245,170  | 24,980            | 10,903
CAIDA Ark July 2012 | 26             | 3,503,595  | 39,630            | 17,716
CAIDA Ark July 2013 | 31             | 14,055,506 | 68,037            | 34,252
CAIDA Ark July 2014 | 35             | 17,044,334 | 76,452            | 36,637

Table 3.1: CAIDA Ark IPv6 Topology Datasets from July 2009 to July 2014

Name of Dataset                   | Number of      | Number of   | Number of Unique  | Number of Unique
                                  | Vantage Points | Traces      | Router Interfaces | Network Masks
----------------------------------+----------------+-------------+-------------------+-----------------
CAIDA Ark January to August 2014  | 40             | 118,043,837 | 144,199           | 77,068
CDN                               | Unknown        | Unknown     | 51,327            | 21,108
Combined                          | Unknown        | Unknown     | 164,026           | 85,021

Table 3.2: CAIDA Ark and CDN Datasets used to Determine Most Common Lower-64 Bits


3.2 Heuristic-Driven Discovery 


3.2.1 Heuristic #1: Frequency of Lower-64 Bits 


Our first heuristic method is based on the research previously performed by Heuse, as
discussed in Section 2.3. The intuition for this heuristic is that the host bits of IPv6
addresses associated with router interfaces have low entropy. Because of this low entropy,
there exists a set of more commonly used host bits. Low entropy in the host bits is
frequently due to IPv6 addresses being statically assigned by network administrators in such
a way as to ease network management. Often the assigned IPv6 address will be an address
that is easily numbered and remembered by a human and facilitates association. Our
hypothesis is that we can use this non-uniform distribution of host bits as a heuristic to
more intelligently probe and discover IPv6 router interfaces. As a reminder, we define the
“host bits” as the 64 lower, or least significant, bits of the 128-bit address. We term the
upper, or most significant, 64 bits of the address the “network bits.”
The common host bits heuristic requires two distinct steps:


1. Empirically gather common IPv6 router interface host bits. For this, we analyze
historical IPv6 probing data from CAIDA’s Ark measurement infrastructure and IPv6
addresses collected from a large CDN.

2. Generate candidate IPv6 addresses, based on the previously determined most common
lower-64 or host bits, for use in experimental probing.


Determining Most Common Lower-64 bits of IPv6 Addresses 
First, we examine the general distribution of IPv6 router interface host bits. If the host bits
are uniformly distributed, then this heuristic method will not be a useful technique. To this
end, we examine the Ark and CDN datasets.


The pseudo-code for our algorithm to determine the most common lower-64 bits of an IPv6
address can be seen in Algorithm 1. We first find the set of unique IPv6 addresses parsed
from our datasets, while also filtering out addresses within any of the special use ranges in
IPv6. The special use ranges we filtered included multicast, link and site local,
private address space, and IPv6 6to4. From this set of unique IPv6 addresses, we extract
the lower-64 bits of each IPv6 address and maintain a count for each unique lower-64 bit
value. We then rank the lower-64 bits in order of decreasing frequency of occurrence.
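The counting procedure described above can be sketched in Python as follows. This is a
minimal illustration, not the code used in the study: the function names are hypothetical,
and the special-use test relies on the classification built into Python's ipaddress module,
which is an assumption and may not match the exact ranges filtered in this work.

```python
from collections import Counter
from ipaddress import IPv6Address

HOST_MASK = (1 << 64) - 1  # selects the lower 64 bits of a 128-bit address


def is_special(address):
    """Assumed stand-in for the special-use filter (multicast, link and
    site local, private space, 6to4) using ipaddress's own classification."""
    return (address.is_multicast or address.is_link_local
            or address.is_site_local or address.is_private
            or address.sixtofour is not None)


def lower64_histogram(interfaces):
    """Count each lower-64 bit value over the unique, non-special addresses."""
    unique = set()
    lower = Counter()
    for text in interfaces:
        address = IPv6Address(text)
        if address in unique or is_special(address):
            continue
        unique.add(address)
        lower[int(address) & HOST_MASK] += 1
    return lower


# Toy input (assumed addresses); most_common() ranks by decreasing frequency.
histogram = lower64_histogram(["2a00:1b00::1", "2607:f8b0:4002:c09::1",
                               "2a02:26f0::2", "fe80::1"])
ranked = histogram.most_common()
```

Using a Counter keeps the ranking step to a single most_common() call.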


Generating Candidate Addresses to Probe 

The top N lower-64 bit values from this sorted list are used to generate valid candidate
IPv6 addresses for experimental probing. As shown in Algorithm 2, we obtain a set of
unique IPv6 network masks from the set of globally advertised IPv6 BGP prefixes in
Routeviews [35]. For each advertised prefix, we form a candidate address by combining the
advertised BGP prefix (regardless of size) with one of the most common host bit values as
the lower-64 bits of the address. As an example, given the BGP prefix 2a00:1b00:: and a most
common host bit value of ::1:1, we would combine them to get the candidate IPv6 address
2a00:1b00::1:1.


Algorithm 1: Histogram of Lower-64 Host Bits Among Set of IPv6 Addresses
Input: Interfaces
Output: Lower

Unique ← ∅
Lower[] ← 0

for i ∈ Interfaces do
    if (i ∉ Unique) ∧ (i ∉ Special) then
        Unique = Unique ∪ {i}
        host = (i & (2^64 − 1))
        Lower[host] = Lower[host] + 1


We create candidate probing lists for the top 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150,
200, 250, 300, 350, 400, 450, and 500 most common lower-64 bit values. Note that this
method of generating candidate IPv6 addresses could contain a subset of the IPv6 addresses
already present in the historical data. In Section 4.1.2, we address this issue in our analysis
of the experimental probing data.


Algorithm 2: Algorithm to Generate Candidate IPv6 Addresses for Experimental Probing
Input: BGPPrefixes
Input: MostCommonLower
Output: TargetProbeAddresses

for i ∈ BGPPrefixes do
    for j ∈ MostCommonLower do
        TargetProbeAddress = BGPPrefix[i] || MostCommonLower[j]
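A minimal Python sketch of the candidate generation step is shown below. The function name
generate_candidates is hypothetical. The sketch assumes that "combining" means keeping the
upper 64 bits of the prefix's network address and placing the host bit value in the lower
64 bits; how prefixes longer than /64 were handled is not stated here, so that case is
covered only by this assumption.

```python
from ipaddress import IPv6Address, ip_network

UPPER_MASK = ((1 << 64) - 1) << 64  # selects the upper 64 bits


def generate_candidates(bgp_prefixes, most_common_lower):
    """Pair every prefix with every common host bit value.

    Assumption: the upper 64 bits come from the prefix's network address and
    the lower 64 bits are replaced by the host bit value.
    """
    targets = []
    for prefix in bgp_prefixes:
        network_bits = int(ip_network(prefix).network_address) & UPPER_MASK
        for host_bits in most_common_lower:
            targets.append(IPv6Address(network_bits | host_bits))
    return targets


# The example from the text: prefix 2a00:1b00:: with host bits ::1:1.
# The /32 length is an assumption made only for this illustration.
candidates = generate_candidates(["2a00:1b00::/32"], [0x10001])
```

The number of candidates is the product of the number of prefixes and the number of host
bit values, which is why the probing lists grow quickly as N increases.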


3.2.2 Heuristic Method #2: Inferring via /126 Point-to-Point Links 

The second heuristic examines the possibility of using historical IPv6 probing data in order
to infer the existence of point-to-point links. The intuition for this heuristic is that
point-to-point links are used in IPv4 to connect one router to another router and are
assigned the smallest subnet necessary. Figure 3.1 provides a diagram for what we refer to
as a point-to-point link in this research. Router A has an interface connected directly to
an interface on Router B.


[Diagram: Router A connected directly to Router B]


Figure 3.1: Diagram of a Point-To-Point Link Instance


In IPv4, point-to-point links are usually /30 or /31 [36]. Based on the existence of point- 
to-point links and their usage in connecting routers in IPv4, we posit IPv6 routers will be 
similarly connected. We hypothesize that given an IPv6 address and the assumption that the 
address is one end of a point-to-point link on a /126 subnet, we can discover new topology 
by probing the complementary end of the point-to-point link. 


The pseudo-code to determine the corresponding IPv6 address assuming a /126 point-to-point
subnet is given in Algorithm 3. We take each IPv6 address from our historical data and
check that we have not seen that address before and that it is not in any of the special
use ranges in IPv6, using the same steps as discussed in Section 3.2.1. Given a unique
global address, we divide the IPv6 address (treated as an integer) by four, which is the
number of unique addresses in a /126 subnet. Based on the value of the remainder from the
division operation, we either add one to or subtract one from the IPv6 address: a remainder
of one means the complement is the next address, and a remainder of two means the
complement is the previous address. This operation provides the IPv6 address corresponding
to the other end of a given /126 point-to-point link. We store both the original IPv6
address and its point-to-point complement (ensuring no duplicate IPv6 addresses are
stored). Once we have exhausted the IPv6 addresses from the datasets, we generate our
candidate IPv6 addresses for probing. To do so, we take the set of stored original IPv6
addresses and calculated point-to-point complement IPv6 addresses and remove the IPv6
addresses that appear in the original datasets.
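The complement computation described above can be sketched in Python as follows. This is
an illustration under stated assumptions rather than the code used in the study: the
function names are hypothetical, and the special-use filtering step is omitted for brevity.

```python
from ipaddress import IPv6Address


def p2p_complement(address):
    """Return the assumed other end of a /126 point-to-point link, or None.

    A remainder of 1 maps to the next address and a remainder of 2 to the
    previous one; remainders 0 and 3 yield no candidate.
    """
    remainder = int(address) % 4
    if remainder == 1:
        return address + 1
    if remainder == 2:
        return address - 1
    return None


def p2p_candidates(interfaces):
    """Complements of the observed addresses that were not already observed."""
    seen = set()
    point_to_point = set()
    for text in interfaces:
        address = IPv6Address(text)
        if address in seen:
            continue
        seen.add(address)
        point_to_point.add(address)
        other = p2p_complement(address)
        if other is not None:
            point_to_point.add(other)
    # Remove everything already present in the historical data.
    return point_to_point - seen


# Toy input with assumed addresses.
targets = p2p_candidates(["2a00:1b00::1", "2a00:1b00::5", "2a00:1b00::6"])
```

Because both ends of a link may already be present in the historical data, the final set
difference can remove a large share of the computed complements.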


Algorithm 3: Algorithm to Infer /126 Point-to-Point Links in IPv6
Input: Interfaces
Output: TargetProbeAddresses

Unique ← ∅
PointtoPoint ← ∅

for i ∈ Interfaces do
    if (i ∉ Unique) ∧ (i ∉ Special) then
        Unique = Unique ∪ {i}
        PointtoPoint = PointtoPoint ∪ {i}
        if i mod 4 == 1 then
            PointtoPoint = PointtoPoint ∪ {i + 1}
        else if i mod 4 == 2 then
            PointtoPoint = PointtoPoint ∪ {i − 1}

TargetProbeAddresses = PointtoPoint \ Unique


3.3 Experimental Probing
Once we generated our candidate lists of IPv6 addresses to probe based on both heuristic
techniques, we used CAIDA’s Ark Topology on Demand (ToD) service to probe each of the
candidate IPv6 addresses [37]. To send our probe requests into the Ark infrastructure, we
feed our candidate list of IPv6 addresses as input into todclient. The results from each set
of probing from the Ark infrastructure were stored in an output file for later analysis. By
performing our experimental data collection using Ark, we were able to conduct probing
from various vantage points around the world using scamper’s implementation of IPv6
paris-traceroute. In our experimental probing we used 16 IPv6 vantage points: 9 were
located in North America, 5 in Europe, 1 in Asia, and 1 in Oceania. Prior to conducting our
experimental probing we ensured all 16 vantage points were up and operational. After
conducting the probing, we verified that if we issued X traces we had X results before
moving on to the analysis of the data. In an effort to reduce the probing load on Ark, we
limited our probing rate to a maximum of 500 probe requests being processed by Ark at any
given time.


3.4 Performing Alias Resolution on Experimental Results 
In order to provide deeper insight into the results of our experimental probing, we
performed alias resolution to determine how much new infrastructure we are discovering when
probing our candidate addresses. Alias resolution allows us to determine whether
newly-discovered interfaces are merely different interfaces on previously discovered routers
(i.e., interfaces previously unknown that belong to a known router) or are new interfaces on
previously unknown routers. To perform alias resolution on our collected data, we generate
an input file of IPv6 addresses that contains all the unique IPv6 addresses from the
historical data and all newly-discovered IPv6 addresses from our probing.


This list of addresses is used as input into scamper’s implementation of the speedtrap 
alias resolution technique, previously discussed in Section 2.4. The output from the alias 
resolution is pairs of IPv6 addresses that are different interfaces on the same physical router. 
We take each pair of aliased IPv6 addresses and convert them into a listing of all IPv6 
addresses associated with a given router. The results from our alias resolution analysis can 
be found in Section 4.1.2 for Heuristic #1 and Section 4.2 for Heuristic #2. 
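
Converting alias pairs into per-router address lists amounts to grouping addresses that are
connected through any chain of pairs. The sketch below shows one way to do this with a
union-find structure. It is an illustration only: the function name group_aliases and the
input format (an iterable of address pairs) are assumptions, not the actual output format
of scamper or the code used in this study.

```python
def group_aliases(pairs):
    """Merge alias pairs into one set of addresses per inferred router."""
    parent = {}

    def find(address):
        # Follow parent links to the representative, shortening the path.
        parent.setdefault(address, address)
        while parent[address] != address:
            parent[address] = parent[parent[address]]
            address = parent[address]
        return address

    for first, second in pairs:
        root_first, root_second = find(first), find(second)
        if root_first != root_second:
            parent[root_second] = root_first

    routers = {}
    for address in list(parent):
        routers.setdefault(find(address), set()).add(address)
    return list(routers.values())


# Toy alias pairs using assumed addresses: the first two pairs share an
# address, so they collapse into one router with three interfaces.
routers = group_aliases([
    ("2a00:1b00::1", "2a00:1b00::5"),
    ("2a00:1b00::5", "2a00:1b00:1::1"),
    ("2607:f8b0::1", "2607:f8b0::2"),
])
```

Addresses that never appear in any pair are not included in the result and would each be
treated as a router with a single known interface.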
CHAPTER 4:


Experimental Results 





In Chapter 3, we introduced and discussed two potential heuristic methods for intelligently
discovering IPv6 router interfaces. This chapter first discusses the results of our
historical analysis of the most commonly used lower-64 bits in router IPv6 addresses. Next,
we discuss the results of our live network probing using the heuristic methods to generate
candidate IPv6 addresses. Finally, we compare the relative performance of the two heuristics.


4.1 Analysis of Heuristic #1: Frequency of Lower-64 Bits 

The intuition for this heuristic method was based on the fact that IPv6 addresses associated
with router interfaces often have low entropy due to manual configuration by network
administrators. Our analysis of this heuristic method is divided into two sections. In the
first, we discuss our analysis of the frequency of the lower-64 bit values of router IPv6
addresses from historical data. In the second, we discuss the results of our experimental
network probing based on the most common host bit values of an IPv6 address.


4.1.1 Analysis of the Lower-64 Bits in IPv6 Addresses 

Before we were able to generate candidate IPv6 addresses based on the most common host
bit values and test our hypothesis, we needed to show that there was indeed low entropy in
the host bits associated with router interfaces in IPv6. We also sought to determine which
host bit values occurred more frequently than others. We began our analysis by observing
the frequency with which host bit values occurred in CAIDA Ark probing data collected in
the month of July over a six-year period. This allowed us to determine if certain host bit
values occur more frequently than others and, if so, how they changed over that six-year
period.


Figure 4.1 summarizes the observed behavior in the frequency of host bit values from the
month of July from 2009 to 2014. We observed that a very small number of unique host bit
values accounted for approximately 60% of all the host bit values observed in the datasets.
However, due to this behavior of host bit values, the data we are most interested in is
compressed against the y-axis in Figure 4.1. Figure 4.2 adjusts the plot axes to focus on
the area of interest. By focusing on the area of interest near the y-axis, we concluded
that over the six years of data about 100 unique host bit values accounted for
approximately 50-60% of all the host bit values observed in the datasets.


Figure 4.1: Cumulative Distribution of Historical Lower-64 Bits of IPv6 Address from CAIDA
Datasets from July 2009 to July 2014


Table 4.1 contains the top 10 most common host bit values from the CAIDA July 2009 and
CAIDA July 2014 datasets. In both datasets the host bit values of ::1 and ::2 represented
on average about 35% of all host bit values in the CAIDA data. The next most common host
bit values on average individually comprised less than 1% of all host bit values contained
in the data. It is also clear from our analysis that the most common host bit values do not
vary much from year to year; for the most part the most common host bit values in July 2009
were the most common host bit values in July 2014. A final observation is that the majority
of the top 30 most common host bit values seem to use only the eight least significant host
bits in an IPv6 address.
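
The coverage figures discussed above (the share of all observed interfaces accounted for by
the top N host bit values) can be computed directly from a frequency table of host bit
values. The sketch below is illustrative only: the function name top_n_coverage is
hypothetical and the sample counts are invented toy values, not data from this study.

```python
from collections import Counter


def top_n_coverage(histogram, n):
    """Fraction of all observations covered by the n most frequent values."""
    total = sum(histogram.values())
    if total == 0:
        return 0.0
    covered = sum(count for _, count in histogram.most_common(n))
    return covered / total


# Toy histogram keyed by lower-64 bit value (assumed counts).
toy = Counter({1: 5, 2: 3, 3: 1, 4: 1})
coverage_top2 = top_n_coverage(toy, 2)
```

Evaluating this function for increasing n produces the points of a cumulative distribution
like those plotted in Figures 4.1 and 4.2.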


Figure 4.2: Cumulative Distribution of Historical Lower-64 Bits of IPv6 Address from CAIDA
Datasets from July 2009 to July 2014 (Zoomed In)


      | CAIDA July 2009 Dataset                 | CAIDA July 2014 Dataset
Top # | IPv6 Host   | Frequency of | Percentage | IPv6 Host   | Frequency of | Percentage
      | Bits        | Host Bits    | of Dataset | Bits        | Host Bits    | of Dataset
------+-------------+--------------+------------+-------------+--------------+-----------
1     | ::2         | 1,516        | 23.79%     | ::2         | 13,627       | 17.82%
2     | ::1         | 849          | 13.32%     | ::1         | 13,429       | 17.57%
3     | 16          | 91           | 1.43%      | 3           | 1,774        | 2.32%
4     | [illegible] | 70           | 1.10%      | 6           | 602          | 0.79%
5     | [illegible] | 66           | 1.04%      | [illegible] | 475          | 0.62%
6     | [illegible] | 64           | 1.00%      | [illegible] | 437          | 0.57%
7     | 212         | 47           | 0.74%      | 712         | 325          | 0.46%
8     | [illegible] | 45           | 0.71%      | 4           | 336          | 0.44%
9     | 20          | 44           | 0.69%      | 211         | 307          | 0.40%
10    | 16          | 39           | 0.61%      | 29          | 307          | 0.40%

Table 4.1: Top Ten Lower-64 bits of an IPv6 Address from CAIDA Datasets from July 2009 and
July 2014


Additionally, we observed that each dataset shown in Figure 4.1 has an inflection point in
the curvature of the graph around 60% to 70% of all the host bit values observed. These
inflection points become more pronounced each year. While we did not pursue any further
investigation regarding the significance of these inflection points, we hypothesize that
the reason these inflection points are becoming more pronounced each year is the growth of
IPv6. Coupled with the growth of IPv6 is the need to add additional IPv6 infrastructure to
the network. The increase in IPv6 infrastructure would require the addition of new routers
and router interfaces in the network. As router interfaces are added to the network, one
must assign a unique IPv6 address to each interface. We surmise that as network
administrators address these new router interfaces, they first do so using addresses from
the set of common host bit values. However, once they have used up the common host bit
values, they begin to assign addresses to interfaces using another addressing scheme. This
addressing scheme appears to be different for each network, based on the presence of the
tail in each graph.


To remove the potential bias due to examining traceroute probe data from a single source,
we additionally analyzed data from a large CDN and compared it to CAIDA’s data.
Figures 4.3 and 4.4 both summarize the observed behavior in the frequency of host bit values
from the CAIDA 2014 and CDN datasets. The behavior observed in these two datasets
is very similar to the behavior observed in our earlier analysis. We observed that a small
number of unique host bit values comprised 60% of all the host bit values, and that each
graph has a distinct inflection point.


We then analyzed the combined datasets in order to obtain the most representative view
of IPv6 router addressing. Table 4.2 contains the 10 most common host bit values from
the combined CAIDA and CDN datasets. Once again we observed that the most common
host bit values are ::1 and ::2, accounting for 31% of all the unique host bit values.
As in our earlier analysis, the next most common host bit values on average individually
comprised less than 1% of all the host bit values.


From our analysis of the frequency of host bit values, we concluded that there is indeed
low entropy in the host bits associated with IPv6 router interfaces. Because of this low
entropy, we were able to show the existence of a set of more commonly used host bit values
used to address IPv6 router interfaces. With this knowledge, we were then able to perform
experimental testing of a heuristic method that uses the most common host bit values to
generate candidate IPv6 addresses for probing.


Figure 4.3: Cumulative Distribution of Historical Lower-64 Bits of IPv6 Address from Combined
CAIDA and CDN Datasets


4.1.2 Analysis of Experimental Probing Results 

The preceding analysis found an inflection point in the distribution of router host addresses
where approximately 50% of all addresses use one of 500 different host bit values. We
therefore use the combined data from the Ark topology probing results from January to
August 2014 and a large CDN to determine the 500 most common host bit values used by
IPv6 router interfaces. Next, we combined the 500 most common host bits with the 19,441
advertised BGP prefixes (as of September 2014) to generate the candidate IPv6 addresses
used for our experimental probing. While we could have conducted the experimental probing
by probing all 500 most common host bit values at once, we broke the probing down into
smaller sets. We created this subdivision for two reasons. First, it allowed for more
granularity in the results, letting us better observe the effect of increasing the number
of most common host bit values probed on the number of router interfaces discovered.
Second, by splitting our probing into multiple rounds we reduced the workload on the Ark
infrastructure and limited the impact on our probing if one of the monitors failed during
a round of probing, which would require us to restart that round.


Figure 4.4: Cumulative Distribution of Historical Lower-64 Bits of IPv6 Address from Combined
CAIDA and CDN Datasets (Zoomed In)


Our experimental probing sets were divided such that for the top 100 most common host bit
values we would probe the top 10 most common host bit values appended to the advertised
BGP prefixes, then we would probe the top 11-20 most common host bit values appended
to the advertised BGP prefixes, and so forth. Once we finished the top 100 most common
host bit values, our probing technique changed slightly such that we then probed the
101-150 most common host bit values, followed by the 151-200 most common host bit values,
and so forth until we conducted probing for all 500 most common host bit values.
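
The batching scheme described above can be written down compactly. The following sketch
enumerates the rank ranges of each probing round; the function name probing_batches is
hypothetical, and the code only illustrates the schedule rather than the tooling used to
drive the probing.

```python
def probing_batches():
    """Return (first_rank, last_rank) pairs for each probing round.

    Rounds cover ten host bit values each up to rank 100, then fifty
    host bit values each up to rank 500.
    """
    upper_bounds = list(range(10, 101, 10)) + list(range(150, 501, 50))
    batches = []
    first = 1
    for last in upper_bounds:
        batches.append((first, last))
        first = last + 1
    return batches


for first, last in probing_batches():
    print(f"probe host bit values ranked {first}-{last}")
```

Each round's target list is then the set of advertised prefixes combined with the host bit
values whose ranks fall in that round's range.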


Once we completed our experimental probing, we began our analysis by creating a list
containing the unique IPv6 address hops observed in the collected traceroute data. To
generate this list of unique IPv6 addresses observed, we parsed the IPv6 address for each
hop in the traceroute output using a methodology similar to the one used for processing
the historical CAIDA Ark topology data. By then removing the IPv6 addresses that we
originally observed in the CAIDA and CDN data from our list of unique IPv6 addresses, we
are able to determine the new IPv6 router addresses discovered as a result of our heuristic
based experimental probing. These newly discovered IPv6 router addresses include both the
probing target IPv6 addresses and the intermediate router addresses seen on the traceroute
path.


      | CAIDA 2014 and CDN Dataset
Top # | IPv6 Host   | Frequency of | Percentage
      | Bits        | Host Bits    | of Dataset
------+-------------+--------------+-----------
1     | [illegible] | 27,306       | 16.65%
2     | [illegible] | 25,376       | 15.47%
3     | [illegible] | 2,519        | 1.54%
4     | 116         | 980          | 0.60%
5     | [illegible] | 839          | 0.51%
6     | [illegible] | 835          | 0.51%
7     | [illegible] | 642          | 0.39%
8     | [illegible] | 573          | 0.35%
9     | 12          | 563          | 0.34%
10    | [illegible] | 55           | 0.34%

Table 4.2: Top Ten Lower-64 bits of an IPv6 Address from Combined CAIDA and CDN Datasets


The results of our experimental probing using the top 500 most common host bit values
are shown in Figure 4.5. From the top 10 most common host bit values we discovered
approximately 5,500 new router interface addresses. Additionally, as we increased the
number of most common host bit values probed, we continued to see a gradual increase in the
number of new router interfaces. However, there was a single anomaly in our experimental
data in which we observed a large jump in the number of interfaces discovered. This large
jump in our results occurred between the top 300 and top 350 most common host bit values.
This anomaly was most likely caused by a several-month gap in our experimental probing,
which resulted from multiple failures in the Ark infrastructure that required us to restart
our probing of the top 301-350 most common host bits after each failure.


Figure 4.5: Number of New Interfaces Discovered using Heuristic #1


Using the speedtrap alias resolution technique, we conducted alias resolution using the
164,026 unique IPv6 addresses from our combined dataset along with the 18,077 new
router interfaces discovered from our heuristic. We found that 17% of the newly discovered
router interfaces from our probing were interfaces on previously unseen router
infrastructure. Another 2% of the newly discovered router interfaces were interfaces on
previously seen router infrastructure. The remaining 81% consisted of previously discovered
router interfaces on previously seen router infrastructure.


In our analysis of the experimental probing results, we wanted to see how the probing order
impacts the rate of new router interfaces discovered. To answer this question, we investigate
three ordering strategies: i) decreasing popularity (e.g., Top 1-10 host bits, followed by Top
11-20 host bits, followed by Top 21-30 host bits, etc.); ii) increasing popularity (e.g., Top
251-300 host bits, followed by Top 201-250 host bits, followed by Top 151-200 host bits,
etc.); and iii) random. Figure 4.6 plots the rate of new interfaces discovered according
to each of these orderings. For this portion of our analysis, we only considered the top
300 most common host bit values to avoid tainting our analysis with the large jump in
new interfaces discovered due to the several-month gap in experimental probing. From
Figure 4.6 we observe that there is a significant effect on the initial rise of newly
discovered router interfaces when selecting the Top 300 host bits in increasing order of
popularity versus decreasing order of popularity. However, there seems to be no significant
effect on the initial rise when comparing the randomly chosen order of popularity to the
decreasing order of popularity. In general, we would expect the number of new interfaces
discovered by randomly choosing the order of popularity to have as an upper bound the
number of new interfaces discovered by decreasing order of popularity and to have as a lower
bound the number of new interfaces discovered by increasing order of popularity. These
results suggest that further investigation into the effect of probing order on topology
discovery is warranted.
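
The ordering comparison above can be illustrated with a small sketch that accumulates the
number of distinct interfaces discovered as batches are processed in different orders. The
function names and the input format (one set of discovered addresses per batch, with index
0 being the most popular batch) are assumptions made for illustration; this is not the
analysis code used to produce Figure 4.6.

```python
import random


def cumulative_discoveries(batches, order):
    """Running count of distinct interfaces as batches are taken in `order`."""
    seen = set()
    totals = []
    for index in order:
        seen |= batches[index]
        totals.append(len(seen))
    return totals


def compare_orderings(batches, seed=0):
    """Cumulative counts for decreasing, increasing, and random batch order."""
    decreasing = list(range(len(batches)))
    increasing = decreasing[::-1]
    shuffled = decreasing[:]
    random.Random(seed).shuffle(shuffled)
    return {
        "decreasing": cumulative_discoveries(batches, decreasing),
        "increasing": cumulative_discoveries(batches, increasing),
        "random": cumulative_discoveries(batches, shuffled),
    }


# Toy batches with invented labels standing in for interface addresses.
curves = compare_orderings([{"a", "b", "c"}, {"c", "d"}, {"e"}])
```

All three orderings end at the same total; only the shape of the curve before the final
batch differs.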
Figure 4.6: Effects of Popularity Order on Number of New Interfaces Discovered using
Heuristic #1


Finally, we sought to determine the fraction of newly discovered router addresses that were
intermediate hops along the path versus the target itself (since our targets are presumably
router interfaces). To do this we first examined the fraction of target IPv6 addresses
responding to our probe request. For the Top 10 most common host bit values appended to a
given BGP prefix, only 6.5% of the 194,420 target addresses probed responded to the probe
request. As the number of Top N most common host bit values increased, the fraction of
responding target addresses steadily decreased. Next, we looked at what fraction of the new
interfaces discovered were the target. Of the 6.5% of target addresses that responded to the
probe request, about 5.6% were new interfaces that were discovered. Table 4.3 contains
the percentages of target addresses that responded to our probe requests and the percentage
of those that did respond that are newly discovered interfaces for the Top 100 most common
host bit values.


















































Top # | Number of Target | Percentage of Target | Percentage of Target Addresses
      | IPv6 Addresses   | Addresses Responding | that Responded to Probe that
      |                  | to Probe             | are Newly Discovered Interfaces
------+------------------+----------------------+--------------------------------
10    | 194,420          | 6.52%                | 5.66%
20    | 388,840          | 4.19%                | 8.41%
30    | 583,260          | 3.26%                | 9.96%
40    | 777,680          | 2.68%                | 11.09%
50    | 972,100          | 2.30%                | 12.05%
60    | 1,166,520        | 2.05%                | 12.87%
70    | 1,360,940        | 1.86%                | 13.88%
80    | 1,555,360        | 1.71%                | 14.72%
90    | 1,749,780        | 1.58%                | 15.35%
100   | 1,944,200        | 1.46%                | 15.77%

Table 4.3: Percentages of Top 100 Target Addresses that Responded to Probing and are Newly
Discovered Interfaces
4.2 Analysis of Heuristic Method #2: Inferring via /126 Point-to-Point Links


The intuition for this heuristic method is based on the assumption that point-to-point links
are used in IPv6 to connect routers to each other and that network administrators often
will assign these point-to-point links the smallest subnet necessary. To test this heuristic,
we used the same dataset used to test our first heuristic method. By using the same input
data for both heuristics, we can meaningfully compare the ability of each heuristic method
to discover router interfaces. Using our heuristic as described in Section 3.2.2, we
generated 127,748 candidate IPv6 addresses for use in our experimental probing. We analyzed
the probing results and found that we had discovered 10,157 new IPv6 router interface
addresses.


While we were able to discover a non-trivial number of new interfaces via this heuristic
method, we performed additional research regarding the subnet sizes associated with IPv6
point-to-point links. Our research suggests that there does not yet appear to be a standard
subnet size associated with point-to-point links in IPv6. Some of the literature suggests
using a /127 subnet for point-to-point links [38]; other literature suggests using a /64 for
point-to-point links [39]. The effectiveness of this heuristic at discovering new router
interfaces could be improved by additional research using different subnet sizes for
inferring the endpoint IPv6 addresses for a given point-to-point link.


Using the speedtrap alias resolution technique, we conducted alias resolution using the
164,026 unique IPv6 addresses from our combined dataset along with the 10,157 new
router interfaces discovered from our heuristic. We found that 16% of the newly discovered
router interfaces from our probing were interfaces on previously unseen router
infrastructure. Another 2% of the newly discovered router interfaces were interfaces on
previously seen router infrastructure. The remaining 82% consisted of previously discovered
router interfaces on previously seen router infrastructure.


As with Heuristic #1, we sought to determine whether our experimental probing was
discovering new router interfaces at our target probing addresses or whether the new
interfaces discovered were simply new intermediate router interfaces. To do this we first
looked at what fraction of the target addresses probed responded to our probe request. Of
the 127,748 target addresses probed, about 40.7% responded to the probe request. Next we
looked at what fraction of the new interfaces discovered were target addresses that
responded to our probe request; in our experimental probing, 31.7% of the new interfaces we
discovered were the target address used for probing.


4.3 Comparison of Heuristic Methods

In this section, we compare our two heuristic methods and their ability to discover new IPv6 
router infrastructure. While each heuristic method did yield a non-trivial number of router 
interfaces discovered, each method required a significant amount of experimental probing. 
One way to compare these two heuristic methods is by the ratio of the number of new
interfaces discovered to the number of candidate IPv6 addresses probed. Using the top 500
most common host bits, the first heuristic method discovered 18,773 router interfaces but
required experimental probing of 9,720,500 candidate IPv6 addresses, a ratio of 0.002.
However, if we consider only the top 10 most common host bits, the first heuristic shows a
significant improvement in this ratio: it discovered 5,532 router interfaces while probing
only 194,410 candidate IPv6 addresses, a ratio of 0.028. The second heuristic method, by
contrast, discovered 10,157 router interfaces while requiring only 127,748 candidate IPv6
addresses, a ratio of 0.080.
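
The ratios above can be reproduced with a few lines of Python. This is only a worked check of the arithmetic using the counts reported in this section; the helper name yield_ratio and the labels are hypothetical.

```python
def yield_ratio(discovered, probed):
    """New interfaces discovered per candidate address probed."""
    return discovered / probed


# (new interfaces discovered, candidate addresses probed)
runs = {
    "Heuristic #1, top 500 host bits": (18_773, 9_720_500),
    "Heuristic #1, top 10 host bits": (5_532, 194_410),
    "Heuristic #2": (10_157, 127_748),
}

for name, (discovered, probed) in runs.items():
    print(f"{name}: {yield_ratio(discovered, probed):.3f}")
```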


Overall, the second heuristic method was the more efficient of the two: it discovered the
most new router interfaces per candidate IPv6 address probed. Both heuristic methods
performed equally well with regard to the alias resolution results and the discovery of
previously unseen router infrastructure.
CHAPTER 5:


Conclusion 





This thesis investigated the feasibility of using heuristic techniques to efficiently discover 
router infrastructure in IPv6. While we considered numerous possible methods to study, 


our research focused on two heuristics. 


The first heuristic method relied upon finding a set of the most commonly used lower-64 
bits in IPv6 router interface addresses and appending these most common lower-64 bit 
values to all advertised BGP prefixes to generate a list of candidate IPv6 addresses for 
probing. We show that, even though there are approximately 1.84 × 10^19 possible lower-64
bit values, only a small number of these are used in the deployed Internet as inferred from 
our historical data. Additionally, we observed that this set of commonly used lower-64 
bit values remained fairly constant over a six-year period. From our experimental probing
using the top 500 most common host bit values, we were able to discover a non-trivial 
amount of previously undiscovered IPv6 router infrastructure. By probing only the top 10 
most common host bit values, this heuristic yielded the largest number of new IPv6 router 


interfaces discovered in a single round of experimental probing. 
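
The candidate-generation step of this first heuristic can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tooling used in this thesis: the function names top_lower64 and candidates are hypothetical, the addresses are documentation-range placeholders, and we assume that each candidate is formed from the upper 64 bits of a prefix's network address.

```python
import ipaddress
from collections import Counter

LOW64_MASK = (1 << 64) - 1


def top_lower64(observed, n):
    """Return the n most common lower-64 bit values among observed addresses."""
    counts = Counter(
        int(ipaddress.IPv6Address(a)) & LOW64_MASK for a in observed
    )
    return [value for value, _ in counts.most_common(n)]


def candidates(prefixes, lower64_values):
    """Combine each prefix with each common lower-64 bit value."""
    out = []
    for prefix in prefixes:
        net = ipaddress.IPv6Network(prefix)
        # Assumption: keep only the upper 64 bits of the network address.
        upper = (int(net.network_address) >> 64) << 64
        for low in lower64_values:
            out.append(ipaddress.IPv6Address(upper | low))
    return out


observed = ["2001:db8:1::1", "2001:db8:2::1", "2001:db8:3::2"]
common = top_lower64(observed, 2)
for addr in candidates(["2001:db8:a::/48"], common):
    print(addr)
```

The candidate list has one entry per (prefix, lower-64 value) pair, which is why the probing cost grows linearly with the number of host bit values considered.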


The second heuristic method relied on the assumption that point-to-point links in IPv6 use 
/126 subnets. Similar to our results from the first heuristic, we again were able to discover 
a non-trivial number of IPv6 router interfaces from our experimental probing. However,
unlike our first heuristic method, this method discovered the greatest number of IPv6
router interfaces per candidate IPv6 address probed.


In conclusion, we showed that simple heuristic techniques are a feasible and effective so- 


lution to the problem of discovering router infrastructure in IPv6. 


5.1 Future Work 


This section presents suggestions for future work that will build upon the starting point of 


our research into using heuristic methods for discovering router interfaces in IPv6. 
5.1.1 Research into Other Heuristic Methods


While we only studied two heuristics in this thesis, additional research into other heuristic 


techniques needs to be performed: 


• One possible heuristic method that could be studied involves completing the se-
quence between known IPv6 addresses. As an example, assume that the following IPv6
addresses exist: 2001:500:3::42, 2001:500:3::45, and 2001:500:3::46. We
could logically assume that there may exist network devices that would respond to
probing at the following two IPv6 addresses: 2001:500:3::43 and 2001:500:3::44.
Barnes et al. have previously conducted research into this sequence completion
heuristic [40]. Their work from 2012 showed limited success in discover-
ing IPv6 infrastructure using a sequence completion heuristic. With the exponential
growth currently being experienced in IPv6, we recommend that the sequence com-
pletion heuristic be reinvestigated.

• Another heuristic that could be studied involves searching DNS records associated
with known IPv6 router interfaces and looking for patterns in the hostnames assigned
to the router interfaces. We could then use the observed patterns in the hostnames
of router interfaces to query for associated AAAA DNS records that may return
candidate IPv6 addresses that would respond to probing. As an example, consider
the IPv6 address 2001:1900:29::a, which corresponds to a router inter-
face. Performing a reverse DNS lookup on this IPv6 address yields a DNS
PTR record of vl-5.car1.phoenix1.level3.net.. The returned PTR record in-
dicates several possible patterns used when assigning the hostname to this particular
router interface. In this example, we could try requesting the AAAA DNS record for
vl-5.car2.phoenix1.level3.net.. The returned AAAA DNS record provides
a candidate IPv6 address of 2001:1900:29::e that may respond to experimental
probing.

• Finally, a technique that generates candidate IPv6 addresses for probing by partici-
pating in peer-to-peer networks, as suggested by Bellovin et al. [28], may reveal
previously unknown infrastructure.
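
As an illustration of the sequence completion idea in the first item above, the following Python sketch fills small gaps between known addresses. It is a simplified sketch and not the method of Barnes et al. [40]: the function name fill_gaps and the max_gap threshold are hypothetical choices made here to keep the candidate list bounded.

```python
import ipaddress


def fill_gaps(known, max_gap=16):
    """Return addresses lying strictly between consecutive known addresses.

    Only gaps of at most max_gap are filled (an assumed cutoff), so two
    unrelated addresses far apart do not generate a huge candidate list.
    """
    values = sorted({int(ipaddress.IPv6Address(a)) for a in known})
    out = []
    for lo, hi in zip(values, values[1:]):
        if 1 < hi - lo <= max_gap:
            out.extend(ipaddress.IPv6Address(v) for v in range(lo + 1, hi))
    return out


known = ["2001:500:3::42", "2001:500:3::45", "2001:500:3::46"]
for addr in fill_gaps(known):
    print(addr)
```

For the three example addresses given in the first item, this produces the two missing addresses 2001:500:3::43 and 2001:500:3::44.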
5.1.2 Integration of Heuristic #1 into CAIDA's Ark

As discussed in Section 2.2.1, CAIDA's Ark currently probes only a random IPv6 address
in a given BGP prefix per round of probing. We suggest that CAIDA, in addition to their 
current method of probing the IPv6 address space, add probing for the top 10 most common 
lower-64 bit values into each round of probing. Based on our results, we believe that 
this additional probing will provide additional useful topology data without incurring a 


significant amount of overhead in time or processing to complete a round of probing. 